On Incomplete XML Documents with Integrity Constraints
Abstract. We consider incomplete specifications of XML documents in the presence of schema information and integrity constraints. We show that integrity constraints such as keys and foreign keys affect consistency of such specifications. We prove that the consistency problem for incomplete specifications with keys and foreign keys can always be solved in NP. We then show a dichotomy result, classifying the complexity of the problem as NP-complete or PTIME, depending on the precise set of features used in incomplete descriptions.
Bisimulations on data graphs
Bisimulation provides structural conditions to characterize indistinguishability from an external observer between nodes on labeled graphs. It is a fundamental notion used in many areas, such as verification, graph-structured databases, and constraint satisfaction. However, several current applications use graphs where nodes also contain data (the so-called “data graphs”), and where observers can test for equality or inequality of data values (e.g., asking the attribute ‘name’ of a node to be different from that of all its neighbors). The present work constitutes a first investigation of “data aware” bisimulations on data graphs. We study the problem of computing such bisimulations, based on the observational indistinguishability for XPath (a language that extends modal logics like PDL with tests for data equality), with and without transitive closure operators. We show that in general the problem is PSPACE-complete, but identify several restrictions that yield better complexity bounds (coNP, PTIME) by controlling suitable parameters of the problem, namely the amount of non-locality allowed, and the class of models considered (graphs, DAGs, trees). In particular, this analysis yields a hierarchy of tractable fragments.
Affiliation: Abriola, Sergio Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación En Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación En Ciencias de la Computacion; Argentina
Affiliation: Barceló, Pablo. Universidad de Chile; Chile
Affiliation: Figueira, Diego. Centre National de la Recherche Scientifique; France
Affiliation: Figueira, Santiago. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación En Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación En Ciencias de la Computacion; Argentina
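As background for the abstract above, the classical (data-free) notion of bisimulation on a node-labeled graph can be computed by naive partition refinement. The sketch below is only an illustration of that classical notion: it does not implement the data-aware XPath bisimulations studied in the paper, and the function name and input encoding are assumptions made for this example.

```python
from collections import defaultdict


def bisimulation_classes(nodes, labels, edges):
    """Coarsest classical bisimulation on a node-labeled directed graph.

    nodes:  iterable of hashable node ids
    labels: dict mapping node -> label
    edges:  iterable of (source, target) pairs
    Returns a dict node -> block id; two nodes are bisimilar iff ids match.
    (Hypothetical helper: data values are ignored entirely.)
    """
    nodes = list(nodes)
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    # Start by grouping nodes that carry the same label.
    block = {n: labels[n] for n in nodes}
    while True:
        # Signature: own block plus the set of blocks reachable in one step.
        sig = {n: (block[n], frozenset(block[m] for m in succ[n])) for n in nodes}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in nodes}
        # Each round refines the previous partition, so an unchanged block
        # count means the partition is stable.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


classes = bisimulation_classes(
    nodes=["a", "b", "c", "d", "e"],
    labels={"a": "x", "b": "x", "c": "y", "d": "y", "e": "x"},
    edges=[("a", "c"), ("b", "d")],
)
```

Here `a` and `b` end up in the same class (each has one `y`-labeled successor), while `e` is separated from them because it has no successor at all.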
Context-Free Path Querying with Structural Representation of Result
The graph data model and graph databases are popular in areas such as
bioinformatics, the semantic web, and social networks. One specific problem in
this area is path querying with constraints formulated in terms of formal
grammars. In this approach the query is written as a grammar, and path querying
amounts to parsing the graph with respect to that grammar. Several solutions
exist, but how to provide a structural representation of the query result that
is practical for answer processing and debugging is still an open problem. In
this paper we propose a graph parsing technique that builds such a
representation with respect to the given grammar in polynomial time and space
for an arbitrary context-free grammar and graph. The proposed algorithm is
based on the generalized LL parsing algorithm, whereas previous solutions are
based mostly on the CYK or Earley algorithms; this reduces time complexity in
some cases.
Comment: Evaluation extended
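To make "path querying as graph parsing" concrete, here is a minimal sketch of context-free path querying under the simpler relational semantics, where the answer is just the set of node pairs connected by a matching path. It uses a CYK-style worklist closure over a grammar in Chomsky normal form, not the generalized LL technique of the paper, and it does not build a structural representation of the result. The function name, the rule encoding, and the example grammar are all assumptions made for illustration; rules producing the empty word are not handled.

```python
from collections import deque


def cfpq(edges, terminal_rules, binary_rules, start):
    """Return all (u, v) such that some path u -> v spells a word derivable from `start`.

    edges:          iterable of (u, label, v)
    terminal_rules: dict terminal label -> set of nonterminals A with A -> label
    binary_rules:   dict (B, C) -> set of nonterminals A with A -> B C
    """
    facts = set()   # triples (nonterminal, source, target)
    work = deque()  # facts whose consequences are not yet explored

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            work.append(fact)

    for u, a, v in edges:
        for nt in terminal_rules.get(a, ()):
            add((nt, u, v))

    while work:
        x, u, v = work.popleft()
        for y, p, q in list(facts):
            if p == v:  # (x: u -> v) followed by (y: v -> q)
                for nt in binary_rules.get((x, y), ()):
                    add((nt, u, q))
            if q == u:  # (y: p -> u) followed by (x: u -> v)
                for nt in binary_rules.get((y, x), ()):
                    add((nt, p, v))
    return {(u, v) for nt, u, v in facts if nt == start}


# Grammar for { a^n b^n : n >= 1 } in CNF: S -> A B | A S1, S1 -> S B, A -> a, B -> b.
terminal_rules = {"a": {"A"}, "b": {"B"}}
binary_rules = {("A", "B"): {"S"}, ("A", "S1"): {"S"}, ("S", "B"): {"S1"}}
chain = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
answers = cfpq(chain, terminal_rules, binary_rules, "S")
```

On this chain the query returns the pairs `(1, 3)` (path labeled `ab`) and `(0, 4)` (path labeled `aabb`).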
Separating Automatic Relations
We study the separability problem for automatic relations (i.e., relations on
finite words definable by synchronous automata) in terms of recognizable
relations (i.e., finite unions of products of regular languages). This problem
takes as input two automatic relations R and R', and asks if there exists a
recognizable relation S that contains R and does not intersect R'. We
show this problem to be undecidable when the number of products allowed in the
recognizable relation is fixed. In particular, checking if there exists a
recognizable relation S with at most k products of regular languages that
separates R from R' is undecidable, for each fixed k ≥ 2. Our proofs
reveal tight connections, of independent interest, between the separability
problem and the finite coloring problem for automatic graphs, where colors are
regular languages.
Comment: Long version of a paper accepted at MFCS 202
Model Interpretability through the Lens of Computational Complexity
In spite of several claims stating that some models are more interpretable
than others -- e.g., "linear models are more interpretable than deep neural
networks" -- we still lack a principled notion of interpretability to formally
compare among different classes of models. We make a step towards such a notion
by studying whether folklore interpretability claims have a correlate in terms
of computational complexity theory. We focus on local post-hoc explainability
queries that, intuitively, attempt to answer why individual inputs are
classified in a certain way by a given model. In a nutshell, we say that a
class C_1 of models is more interpretable than another class C_2, if the
computational complexity of answering post-hoc queries for models in C_2 is
higher than for those in C_1. We
prove that this notion provides a good theoretical counterpart to current
beliefs on the interpretability of models; in particular, we show that under
our definition and under standard complexity-theoretic assumptions (such
as P ≠ NP), both linear and tree-based models are strictly more
interpretable than neural networks. Our complexity analysis, however, does not
provide a clear-cut difference between linear and tree-based models, as we
obtain different results depending on the particular post-hoc explanations
considered. Finally, by applying a finer complexity analysis based on
parameterized complexity, we are able to prove a theoretical result suggesting
that shallow neural networks are more interpretable than deeper ones.
Comment: 36 pages, including 9 pages of main text. This is the arXiv version
of the NeurIPS'2020 paper. Except for minor differences that could be
introduced by the publisher, the only difference should be the addition of
the appendix, which contains all the proofs that do not appear in the main
text.
No Agreement Without Loss: Learning and Social Choice in Peer Review
In peer review systems, reviewers are often asked to evaluate various
features of submissions, such as technical quality or novelty. A score is given
to each of the predefined features and based on these the reviewer has to
provide an overall quantitative recommendation. However, reviewers differ in
how much they value different features. It may be assumed that each reviewer
has her own mapping from a set of criteria scores (score vectors) to a
recommendation, and that different reviewers have different mappings in mind.
Recently, Noothigattu, Shah and Procaccia introduced a novel framework for
obtaining an aggregated mapping by means of Empirical Risk Minimization based
on L(p,q) loss functions, and studied its axiomatic properties in the sense
of social choice theory. We provide a body of new results about this framework.
On the one hand we study a trade-off between strategy-proofness and the ability
of the method to properly capture agreements of the majority of reviewers. On
the other hand, we show that dropping a certain unrealistic assumption
invalidates the previously reported results. Moreover, in the general
case, strategy-proofness fails dramatically in the sense that a reviewer is
able to make significant changes to the solution in her favor by arbitrarily
small changes to her true beliefs. In particular, no approximate version of
strategy-proofness is possible in this general setting since the method is not
even continuous w.r.t. the data. Finally we propose a modified aggregation
algorithm which is continuous and show that it has good axiomatic properties.
Comment: preprint submitted to a conference
Semantic Optimization of Conjunctive Queries
This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypertreewidth bounded by a fixed constant k ≥ 1; the associated fragment is denoted GHW_k. A CQ is semantically in GHW_k if it is equivalent to a CQ in GHW_k. The problem of checking whether a CQ is semantically in GHW_k has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in GHW_k under the constraints, while not being semantically in GHW_k without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in GHW_k, for a fixed k ≥ 1, under the constraints, or, in other words, is the query equivalent to one that belongs to GHW_k over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW_1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs.
In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHW_k alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHW_k, we discuss how it can be approximated via a CQ that falls in GHW_k in an optimal way. Such approximations might help in finding “quick” answers to the input query when exact evaluation is intractable.
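The abstract above leans on CQ containment and equivalence. In the constraint-free case, containment can be tested with the classical homomorphism criterion: Q1 is contained in Q2 exactly when there is a homomorphism from Q2 to Q1 that maps head to head. The brute-force sketch below illustrates only that criterion; it ignores TGDs and EGDs, does not support constants, and its query encoding and function name are assumptions made for this example, not taken from the article.

```python
def contained_in(q1, q2):
    """Check Q1 ⊆ Q2 for constraint-free CQs via the homomorphism test.

    A query is a pair (head, atoms): head is a tuple of variables and atoms
    is a list of (relation_name, tuple_of_variables).
    """
    head1, atoms1 = q1
    head2, atoms2 = q2
    if len(head1) != len(head2):
        return False

    # The homomorphism must send the head of Q2 to the head of Q1.
    h = {}
    for x2, x1 in zip(head2, head1):
        if h.setdefault(x2, x1) != x1:
            return False

    def extend(i, h):
        # Try to map the i-th atom of Q2 onto some atom of Q1, backtracking on failure.
        if i == len(atoms2):
            return True
        rel, args = atoms2[i]
        for rel1, args1 in atoms1:
            if rel1 != rel or len(args1) != len(args):
                continue
            g = dict(h)
            if all(g.setdefault(a, b) == b for a, b in zip(args, args1)):
                if extend(i + 1, g):
                    return True
        return False

    return extend(0, h)


# Boolean queries: a directed triangle and a directed path of length two.
triangle = ((), [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))])
path2 = ((), [("E", ("a", "b")), ("E", ("b", "c"))])
triangle_in_path = contained_in(triangle, path2)
path_in_triangle = contained_in(path2, triangle)
```

Every database containing a triangle also contains a path of length two, so the first check succeeds and the second fails. Two queries are equivalent when both directions succeed.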